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Abstract 

The graphical lasso (glasso) is a widely-used fast algorithm for estimating sparse 
inverse covariance matrices. The glasso solves an ii penalized maximum likelihood 
problem and is available as an R library on CRAN. The output from the glasso, 
a regularized covariance matrix estimate t^giasso and a sparse inverse covariance 
matrix estimate flgiasso, not only identify a graphical model but can also serve as 
intermediate inputs into multivariate procedures such as PCA, LDA, MANOVA, 
and others. The glasso indeed produces a covariance matrix estimate S^iasso which 
solves the £i penalized optimization problem in a dual sense; however, the method 
for producing Clgiasso after this optimization is inexact and may produce asymmet- 
ric estimates. This problem is exacerbated when the amount of ii regularization 
that is applied is small, which in turn is more likely to occur if the true underlying 
inverse covariance matrix is not sparse. The lack of symmetry can potentially 
have consequences. First, it implies that ^gi^sgg 7^ ^giasso and second, asymmetry 
can possibly lead to negative or complex eigenvalues, rendering many multivariate 
procedures which may depend on Clgiasso unusable. We demonstrate this problem, 
explain its causes, and propose possible remedies. 

Keywords: Concentration model selection, glasso. Graphical Gaussian Models, 
graphical lasso, ii regularization. 



1. Introduction 

In modern applications, many data sets are simultaneously high-dimensional 
and low in sample size. Classic examples include microarray gene expression and 
SNP data. Dealing with such datasets has become an area of great interest in many 
fields such as biostatistics. Algorithms such as the graphical lasso (Friedman et al., 
2008; Hastie et al., 2009) have been proposed to obtain regularized covariance 
estimators in the n p setting (where n is the sample size and p is the problem 
dimension) as well as perform graphical model selection. 

In the case of the graphical lasso, graphical model selection involves inferring 
a concentration graph (or equivalently, a Markov model). A concentration graph 
encodes zeros in the inverse covariance (concentration) matrix, i.e., z 7^ j for i,j G 
{1, . . . ,p} in the graph implies that the partial correlation p (Xj, Aj|Afc^{j j}) = 0. 
Along with inferring such a graph, the glasso provides p x p dimensional matrix 
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estimators for both the covariance and concentration matrices, denoted Sa and Clx 
respectively, for a given penalty parameter A > 0. In particular, Cl\ is the solution 
to the convex maximization problem 



y-1 



argmin [loEfdetfX) 



tY{SX) - x\\x\ 



where S is the sample covariance matrix, X = {xij}^^^^ is positive definite and 

= Ylij The non-zero elements of Qx correspond to edges in the esti- 
mated concentration graph. 

In some applications, graphical model selection is the primary goal, where in 
other situations the estimators Sa and Cl\ are used as inputs into other multivariate 
algorithms where a regularized covariance estimator is required. Typical examples 
include LDA, PCA, and MANOVA. Hence, it is often necessary that not only 
tl = txy 0, but also that = (ix, Clx ^ 0, and fi^^ = tx- We find that the 
output of the graphical lasso does not meet these conditions in certain situations, 
explain why, and discuss how to solve this problem. Such situations arise primarily 
when 5* is rank-deficient and A is small. A low level of regularization is required 
when the true underlying concentration matrix is not sparse. It should however be 
noted that the glasso algorithm does indeed solve the dual problem corresponding 
to (1), so the above assertions should be interpreted in context. 



2. Motivating Examples 

We now present two motivating examples, one in a classical setting and another 
in a high-dimensional setting, to illustrate the problem. 

2.1. Example 1: Low dimensional, large sample size inverse covariance estimation 
Consider n = 500 i.i.d. samples drawn from a p = 5 dimensional multivariate 
Gaussian distribution with mean = and concentration matrix: 
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The glasso algorithm was applied to this data set. A regularization parame- 
ter of A = 0.0033, which is close to the cross-validated estimate, was chosen to 
demonstrate the problem. The glasso estimators for Q and S = for a given A 
are denoted fix and Sa. 

For reasons which are clarified in Section 3, the glasso produces estimators 
which are neither symmetric nor true inverses of one another, i.e., Clj^ ^ fix and 
7^ f^A- To quantify the lack of symmetry, consider the matrix of relative errors 



between the elements of fix and as defined by Err, 
For the numerical example above. 
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with the convention that if Clx{i,j) = = then Ervij = 0. Note that 

the entries Err^^2 = 100% and Err2^5 = oo occur because Clx{5,2) ^ while 
Clx{2,5) = 0. 

Although the relative errors are small, i.e., on the order of 2%, there is a 
clear lack of symmetry in Clx and moreover the sparsity patterns in the upper and 
lower parts of Clx are different, and thus yield two different graphical models. In 
particular, Clx (5, 2) 7^ which indicates an edge between variables 2 and 5, while 
^2^(2,5) = indicates the absence of such. Furthermore, in high-dimensional 

examples, a graph is often calculated automatically when ( ^^a ) > e for some 

small e. In such lack of symmetry may result, yielding two separate 

graphs. 

2.2. Example 2: High dimensional, low sample size autoregressive model 

The lack of symmetry in Clx, and the resulting difference in the concentration 
graphs corresponding to the upper and lower parts of Clx, often becomes more 
pronounced as the dimension p grows. 

We now consider a high dimensional example with n = 250 i.i.d. samples 
drawn from a Gaussian AR(1) model such that X^+i = (pXt + for t = 2, ...,p 

and Xi = ei. Here, p = 500, (j) = 0.75, and ej A/'(0, 1), t = 1, . . . ,p. The 
concentration matrix Q is tridiagonal, with the diagonal entries equal to 1 and the 
off-diagonal entries equal to —0.75. 

Given a glasso estimator Clx, let Ei and E2 denote the edge sets corresponding 
to the upper and lower halves of ^a, respectively. Then the symmetric difference 
|£'iA£'2| is the number of edges which are present in the concentration graph 
encoded by one half of Clx but not in the graph encoded by the other half. 

The glasso algorithm was applied to samples from the above model with the 
regularization parameter A taking values between 0.001 and 0.03 in increments 
of 0.001. To put these values in perspective, note that when A = 0.03, 102,278 
out of 124, 750 (82%) of the estimated off-diagonal entries were 0. The number of 
edge differences \EiAE2\ corresponding to Clx as A varies between 0.001 and 0.03 
is shown in Figure 1. 

Figure 1: \EiAE2\ vs. A for an AR{1) model with 4> = 0.75, p = 500, and n = 250. The red 
dashed Hne is at |£^iA£:2| = 20. 
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Note that at small values of A, the difference in the graphs corresponding to 
the upper and lower parts of Clx as denoted by \EiAE2\ can be substantial. Hence, 
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the lack of symmetry in Clx can result in two completely different graphical mod- 
els. Moreover, although|£'iAi?2| decreases as A increases it nevertheless remains 
nonzero as regularization increases. 

2.3. Consequences of asymmetry in the glasso concentration matrix estimator 

Users of the glasso may find the lack of symmetry a problem for a number of 
reasons: 

1. not a mathematically valid estimator for Vt., since Vt^ ^ tl\ and Vt\ ^ 
■ 

2. There is no guarantee that Qx has real positive eigenvalues. If it has negative 
or complex eigenvalues, many multivariate procedures such as LDA and PCA 
may not be well-defined. 

3. There may be differences between the edge sets of the concentration graphs 
corresponding to the respective upper and lower halves of Clx- 

We examine the causes of the lack of symmetry in Section 3 and suggest possible 
remedies in Section 4. 

3. Cause of Asymmetry in the Glasso Concentration Matrix Estimator 

The glasso algorithm, taken directly from Hastie et al. (2009), is shown in 
Algorithm 1. For further details concerning the glasso and its convergence, see 
Friedman et al. (2008) and Hastie et al. (2009). In Algorithm 1, is the sample 
covariance matrix, A is the glasso penalty parameter, and 14^ is a matrix on which 
the glasso iterates. In Step 2 of Algorithm 1, Wu refers to the submatrix of W 
without its j*^ row and column, and Su is the j^^ column of the sample covariance 
matrix without the diagonal element Sjj. In Step 3 of Algorithm 1, 9i2 for a 
given j is the j^^ column of the matrix without Qjj. Upon termination of the 
algorithm, the current iterate W is set to Sa and 6 is set to Clx, and referred to 
as the glasso estimators. 



Algorithm 1 The glasso, exactly as it appears on p. 636 of Hastie et al. (2009). 



1. 


Initialize W = S + Al. The diagonal of W remains unchanged in 




what follows. 


2. 


Repeat for j = 1, 2, . . . p, 1, 2, . . . p, . . . until convergence: 




(a) Partition the matrix W into part 1: all but the jth row and 




column, and part 2: the jth row and column. 




(b) Solve the estimating equations Wu/? — Si2 + A • Sign(/3) = 




using the cyclical coordinate-descent algorithm (17.26) for the 




modified lasso. 




(c) Update ■Wi2 = Wii/3 


3. 


In the final cycle (for each j) solve for §12 = — /3 • §22, with 1/^22 = 




W22 - vj'[2P. 
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3.1. Construction of Clx in the glasso 

The glasso iteratively updates a matrix W which converges numerically to 
Sa; the glasso estimator for the population covariance matrix E. In contrast, the 
estimator Vt\ for the precision matrix VL is constructed only upon convergence, i.e., 
only after the algorithm terminates. As we shall show below, the process by which 
VL\ is constructed avoids inversion but is however mathematically inexact in the 
sense that it leads to Vl\ 7^ VL\ and (V^^ 7^ • If the graph encoded 

by the glasso output ^a may be different from the graph encoded by Sa^- This 
problem was illustrated in the two motivating examples above. 

Step 2 of Algorithm 1 involves an inner loop in which row/column 1, ... ,p of 
W are sequentially updated. For one full inner loop over the p rows and columns 
of VF, let the p successive estimates be denoted VF'-*-* for i = 1, . . . ,p. Exactly one 
row and column of W^^^ is updated using a lasso coefficient /3^^^ {(3 of Step 2 in 
Algorithm 1). 

We now introduce additional notation in order to illustrate the problems en- 
countered when the glasso constructs an estimate of the concentration matrix 
(recall that this takes place upon termination of the glasso algorithm). Consider 
once more W^'^^ for z = 1, . . . Define 9^*^ = (VF^*^) ^ and ^ to be the {p — 1) 
vector consisting of the i*^ column of B*-*^ excluding the diagonal entry o'f^ . De- 
fine ff^lj and w^^^ be the corresponding elements of M^^'^and let W^j _^ be the 
j^th principal minor of W^^\ Then using the fact that G*^*-* = (W^^^) ^, there is a 
closed-form expression for 6^1^ and 6*^] ^ in terms of Su, w^ll^, and 



6. = -/3«^!?, ,T ■ (2) 

W:/ — [W 



-1,1 



/3W 



When the glasso terminates, it sets Sa = W^^^ and uses (2) to compute 
l^'l*-', 6'^] 'I , which are taken as the columns of fix. This procedure has a com- 
plexity of O {p^) and is therefore more efficient than direct numerical inversion. 

3.2. Cause of asymmetry in Vt\ 

The glasso terminates when W converges numerically, and constructs CL\ from 
-se a. o^e .o. p..„.s ..... ... 

loop, thus avoiding the need to invert W^^\ However, while |^^p,p, | is equal 
to the the pth. row and column of 0*-^^ = (VF^^-*) ^ = ^A^^iy construction, the 
I ^ii J I . '^ot equal to the i^^ row and column of (VT*^^^) ^. Instead, by 

construction each |6'j-*\ is equal to the i^^ row and column of (VT^*^) ^ 7^ 

{W^p)^ ^. Asymmetry occurs because the quantities 1 6'- 6*1*1^1 are taken as 
the columns of CL\. 

The discrepancy between the above set of estimates may not be minimal even 
if the iterates M^^*^ are approximately equal. Another way of stating this problem 
is that convergence of the W^'''> to a specified tolerance does not necessarily imply 
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convergence of(H^») to any given tolerance. The result is that while the glasso 
covariance estimator Sa satisfies (1), Clx does not, leading to the aforementioned 
problems. The problem is exacerbated when the penalty parameter A is small and 
S is close to rank-deficient (which is the case when n <^ p). The following lemma 
formalizes this assertion. 

Lemma 1. If S is rank- deficient, the maximum absolute value of the entries of 
Clx diverges as X 0. 



Proof. See the appendix. 



□ 



Lemma (1) suggests that convergence of the inverse glasso iterates to 
some small, fixed tolerance may require a radically small tolerance criterion for 
the convergence of the glasso iterates W. Indeed, it is easy to construct such 
examples. Consider the rank-deficient sample covariance matrix S shown below 
alongside the optimal solution to (1) corresponding to a regularization parameter 
of A = 10-6; 
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(3) 



Moreover, consider a matrix iterate Wt which is close to Sa, given as follows 



1 + 10-6 
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(4) 



The supremum-norm errors on the dual and primal for this Wt are given respec- 
tively by 



Wt 



t~HO~^ and 



Wf^ - tlx 



t 



-lo^ 



(5) 



and are several orders of magnitude apart. This example demonstrates why de- 
creasing the convergence tolerance on the dual iterates W may not always be a 
feasible solution to the asymmetry problem discussed here. 

To summarize, the method for inversion used during the final step of the glasso 
algorithm for computing Clx is mathematically inexact, and the resulting error is 
exacerbated when p > n with an insufficiently large penalty parameter A. In 
this case, the £i-penalized inverse covariance estimator is unreliable as A — )■ 0, as 
described by Lemma 1 Therefore the use of an overly small A should be avoided; 
however, in practice, choosing the penalty parameter A can be challenging. For 
example, choosing A via cross-validation using Clx tends to yield overly small A, 
which may produce dense and possibly ill-conditioned estimates for Clx- One 
possible indicator of too little regularization is when the number of neighbors of 
each variable/node is too high. A second possible indicator is if there is a serious 
lack of symmetry in the glasso estimates. A third possible indicator is when the 
condition number of the resulting estimate is too high. Recently, Won et al. (2012) 
provide impetus for constraining the condition number of the covariance matrix; 
in light of that work, the condition number can perhaps be used as a guide in 
choosing A. 
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4. Enforcing Symmetry on the Glasso Concentration Matrix Estimator 

The glasso covariance estimator f^x is the true numeric minimum of the glasso 
problem (1) and thus a valid ii regularized estimator for the true population 
covariance matrix S. However, as previously demonstrated, the glasso estimator 

is asymmetric, and (i^^ ^ Sa- 

In some settings, it may be desirable to resolve one or both of the aforemen- 
tioned issues. For f^A to encode a sparse concentration graph, its sparsity pattern 
must be symmetric. Moreover, if Clx is to be used as a sparse concentration ma- 
trix estimator, it is necessary that Clj^ = fix for it to be a valid estimator. Most 
importantly, it may be required that fix = ^ in order for it to be usable in 
multivariate procedures. 

We propose three simple approaches which address some or all of the above 
requirements. 

1. Numerical inversion. To have (ix = ^x'^, it is necessary to directly 
invert Sa- This inversion maintains the sparsity pattern of (^a (although as 
a consequence of numerical error there may be negligible entries in place 
of zeroes). The 0{p^) complexity of numerical inversion (vs. 0{p^) for 
the current glasso approach) does not represent a difficulty for matrices 
of the dimension which the glasso is currently able to solve (up to p ^ 
2000 on a typical desktop), as it also needs to be done only once at the 
end of the glasso iteration. Note that the inverted matrix should be 
hard thresholded to eliminate small non-zero entries introduced by numerical 
error, and the resulting inverse covariance matrix then should be checked for 
positive definiteness. However, numerical inversion may not be a viable 
option when Sa is ill-conditioned, as is the case when S is rank-deficient and 
A is small. In such cases, it may be useful to exercise caution when using fix 
in further calculations. 

2. Modified glasso output. The upper right triangle of ^a can be taken as 
the correct estimate. The entries corresponding to the upper right triangle 
are more recent updates than those in the lower left triangle, since the glasso 

inserts the | ^f^] j , | into the columns of fix- The resulting estimator 

will not equal Sa^, but it is symmetric. It will not solve the primal problem 
in (1) exactly. 

3. Iterative proportional fitting (IPF). IPF (Speed and Kiiveri, 1986) can 
be used to simultaneously compute the maximum likelihood estimates for 
fl and S under an assumed concentration graph, i.e., sparsity pattern in fl. 
One approach is to use the sparsity pattern from the upper right triangle 
of fix, enforce symmetry, and then use IPF to obtain E and fl. The esti- 
mator fl will refiect the sparsity structure corresponding to fix, and satisfy 
fl = at each iteration of IPF. Note that neither fl nor S will be solu- 
tions to (1) or (6), respectively. Furthermore, the computational complexity 
of IPF is 0{c^), where c is the size of the largest maximal clique of the graph 
implied by fix- Therefore, IPF does not imply relatively higher computa- 
tional costs, although it does require identifying the maximal cliques which 
is well-known to be NP-complete. Finding the maximal cliques can however 
be avoided if a modification of the glasso algorithm is used to estimate an 
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undirected Gaussian graphical model with known structure (see Algorithm 
17.1 in Hastie et al. (2009)). 

Table 1 summarizes the properties and tradeoffs of each of the proposed solu- 
tions. 



Table 1: Comparison of possible estimators. 
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5. Conclusions 

In this note we demonstrated that the estimators from the widely used R 
package glasso may be asymmetric when the amount of regularization applied is 
small. This could cause problems when the glasso estimators are used as inputs 
to other multivariate procedures, and additionally because the sparsity structure 
of the glasso estimators may themselves be asymmetric. It may be helpful for 
users of the package glasso to be aware of this, as the estimator can be easily 
corrected by one of the outlined methods. Of these, numerical inversion followed 
by thresholding may be the simplest and most effective fix. The root cause of 
the issue is that the glasso algorithm operates on the dual of (1), and constructs 
the primal estimator, Clx, only after the dual optimization completes. If a sparse 
concentration estimator is sought, it may be more natural to operate off the primal 
problem (1), though the glasso is more popular in practice. Methods for solving 
the primal (1) have been recently considered, among others see Maleki et al. (2010) 
and Mazumder and Agarwal (2011). This short note avoids recourse to the primal 
by identifying problems with the dual approach, and consequently explores ways 
in which these can be easily rectified so that the popular dual approach can be 
retained. 
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Appendix 

Proof. Consider the dual of (1) as given in Banerjee et al. (2008): 



Sa = argmin [logdet (X)] 
s.t. max \xij — Sij\ < A 



(6) 



where max. 



\mij\ is the supremum norm, the maximum absolute value entry of 
the matrix M. From (6), it is clear that Sa — in the supremum norm as A — )■ 0, 
though at A = the primal problem (1) does not necessarily have a solution. 
Convergence in sup-norm gives convergence of Sa — !■ in any other operator norm 
II •11^. In particular, invoking the continuity of eigenvalues, Amin {^>^ ^ Amin ('S') 

as A — )■ 0, with Amin {M) defined as the smallest eigenvalue of the square matrix 
M. Considering the operator 2-norm and cxD-norm of gives: 



max 



max 



> 



P 



-1 



> p 



-1 



P Amax (^X 



P 



>min I 



-1 



A^O 



P ^ [Amin (S)] ^ 



In the sample-deficient case n <^ p, Amin (S) = almost surely, and therefore fix 
diverges with respect to the supremum norm as A — )• 0. □ 
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